11 itertools: Iterator Tools

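Everyday code is full of loops: walking several lists as one, enumerating every combination, grouping by a condition, processing data in batches. Written as plain for loops, these tend to be long-winded and, in hot paths, slow. The itertools module is built for exactly this. It provides a set of "iterator building blocks" that are implemented in C in CPython and are often faster than an equivalent hand-written pure-Python loop.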

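The functions in itertools return iterators (tee() returns a tuple of them) and evaluate lazily: items are not generated all at once but produced one at a time as they are consumed, so memory use stays small even for very large or infinite sequences.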
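
To make the lazy behaviour concrete, here is a minimal sketch. The generator expression `squares` is an illustrative name, not something from the module; it sits on top of an infinite counter, yet only the values actually requested are ever computed.

```python
from itertools import count, islice

# count(1) is infinite, but nothing is computed until a value is requested.
squares = (n * n for n in count(1))

# islice() pulls only the first five values from the infinite stream.
first_five = list(islice(squares, 5))
print(first_five)  # [1, 4, 9, 16, 25]

# The generator remembers its position; the next value continues from there.
next_square = next(squares)
print(next_square)  # 36
```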

1. Chaining Iterators

1.1 chain()

Joins several iterables into one continuous sequence.

python

from itertools import chain

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [7, 8, 9]

# Chain several lists together
result = list(chain(list1, list2, list3))
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

When the iterables to chain are themselves held in another iterable (a list of lists, for example), use chain.from_iterable():

python

from itertools import chain

lists = [[1, 2], [3, 4], [5, 6]]

# Flatten one level of nesting
result = list(chain.from_iterable(lists))
print(result)  # [1, 2, 3, 4, 5, 6]
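
One practical difference between the two forms: chain(*iterables) has to unpack the outer iterable up front, while chain.from_iterable() reads it lazily. The sketch below illustrates this with a hypothetical `pages()` generator (an assumption for illustration only) that never ends.

```python
from itertools import chain, islice

def pages():
    """Hypothetical paged source: yields one list of records per page, forever."""
    page = 0
    while True:
        yield [page * 3 + i for i in range(3)]
        page += 1

# chain(*pages()) would try to unpack the infinite generator and never return.
# chain.from_iterable() takes the outer iterable as-is and reads it lazily.
records = chain.from_iterable(pages())
first_seven = list(islice(records, 7))
print(first_seven)  # [0, 1, 2, 3, 4, 5, 6]
```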

1.2 zip_longest()

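Zips iterables until the longest one is exhausted, padding the shorter ones with a fill value (fillvalue, which defaults to None).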

python

from itertools import zip_longest

a = [1, 2, 3]
b = ['a', 'b']

# Plain zip() stops at the shortest input
list(zip(a, b))          # [(1, 'a'), (2, 'b')]

# zip_longest() continues to the longest input
list(zip_longest(a, b, fillvalue='-'))
# [(1, 'a'), (2, 'b'), (3, '-')]

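2. Combinations and Permutations

2.1 combinations()

Returns every combination of length r: order does not matter, and no element is used more than once.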

python

from itertools import combinations

items = ['A', 'B', 'C', 'D']

# All combinations of 2 items
list(combinations(items, 2))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

# All combinations of 3 items
list(combinations(items, 3))
# [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D')]

2.2 combinations_with_replacement()

Combinations in which the same element may be chosen more than once.

python

from itertools import combinations_with_replacement

items = ['A', 'B', 'C']

list(combinations_with_replacement(items, 2))
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

2.3 permutations()

Returns every permutation (order matters).

python

from itertools import permutations

items = ['A', 'B', 'C']

# All permutations of 2 items
list(permutations(items, 2))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# Full-length permutations (r defaults to the length of the input)
list(permutations(items))
# [('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ...]
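
A quick way to sanity-check these three functions is to compare how many tuples they yield against the standard counting formulas. This sketch assumes Python 3.8+ for math.comb and math.perm; the variable names are illustrative.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, perm

items = ['A', 'B', 'C', 'D']
n, r = len(items), 2

# Count the results without building the full lists.
n_combinations = sum(1 for _ in combinations(items, r))
n_with_replacement = sum(1 for _ in combinations_with_replacement(items, r))
n_permutations = sum(1 for _ in permutations(items, r))

print(n_combinations == comb(n, r))              # True: C(4, 2) = 6
print(n_with_replacement == comb(n + r - 1, r))  # True: C(5, 2) = 10
print(n_permutations == perm(n, r))              # True: P(4, 2) = 12
```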

2.4 product()

Cartesian product, equivalent to nested for loops.

python

from itertools import product

colors = ['red', 'blue']
sizes = ['S', 'M', 'L']

# Cartesian product of two lists
list(product(colors, sizes))
# [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# Product of an iterable with itself (the repeat parameter)
list(product([0, 1], repeat=3))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), ...]
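
The equivalence with nested loops can be checked directly. The following is a small sketch with illustrative variable names; it builds the same pairs both ways and compares them.

```python
from itertools import product

colors = ['red', 'blue']
sizes = ['S', 'M', 'L']

# The nested-loop form: the rightmost iterable varies fastest.
nested = []
for color in colors:
    for size in sizes:
        nested.append((color, size))

# product() yields the same tuples in the same order.
via_product = list(product(colors, sizes))
print(via_product == nested)  # True
```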

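3. Grouping and Slicing

3.1 groupby()

Groups consecutive elements that share the same key. Note: groupby() only merges adjacent items, so to get one group per key the data must first be sorted by that same key.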

python

from itertools import groupby

data = [
    {"type": "fruit", "name": "apple"},
    {"type": "fruit", "name": "banana"},
    {"type": "veg", "name": "carrot"},
    {"type": "veg", "name": "potato"},
]

# Group by type (the data is already ordered by type)
for key, group in groupby(data, key=lambda x: x["type"]):
    items = list(group)
    print(f"{key}: {[x['name'] for x in items]}")
# fruit: ['apple', 'banana']
# veg: ['carrot', 'potato']
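
The sorting requirement is easy to overlook, so here is a minimal sketch of what happens without it. The word list and the `first_letter` helper are made up for illustration.

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']

def first_letter(word):
    return word[0]

# Unsorted input: 'a' and 'b' each show up as two separate groups.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']

# Sorting by the same key first gives one group per key.
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry']}
```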

3.2 islice()

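Slices an iterator. Iterators do not support the usual [start:stop] syntax, so islice() takes its place; unlike list slicing, it does not accept negative values.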

python

from itertools import islice

data = iter([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Take the first 3 items
list(islice(data, 3))  # [0, 1, 2]

# Items at indices 2 through 4 (stop is exclusive)
data = iter([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
list(islice(data, 2, 5))  # [2, 3, 4]

# With a step
data = iter([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
list(islice(data, 0, 10, 2))  # [0, 2, 4, 6, 8]
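
The example above recreates the iterator before each call for a reason: islice() consumes the underlying iterator rather than copying it. A short sketch of that behaviour, with illustrative names:

```python
from itertools import islice

data = iter(range(10))

head = list(islice(data, 3))
print(head)  # [0, 1, 2]

# The same iterator has already advanced, so this continues from 3.
following = list(islice(data, 3))
print(following)  # [3, 4, 5]

# Whatever is left can still be read normally.
rest = list(data)
print(rest)  # [6, 7, 8, 9]
```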

3.3 batched()

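Splits an iterable into batches of a fixed size (Python 3.12+). Each batch is a tuple, and the last one may be shorter than the rest.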

python

from itertools import batched

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Split into batches of 3
list(batched(data, 3))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]

# With strict=True (added in Python 3.13), an incomplete final batch raises an error
list(batched(data, 3, strict=True))
# ValueError: batched(): incomplete batch
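
On interpreters older than 3.12, similar behaviour can be approximated with islice(). The function below is a sketch under that assumption; `batched_fallback` is a hypothetical name, not part of itertools, and it does not implement the strict option.

```python
from itertools import islice

def batched_fallback(iterable, n):
    """Hypothetical stand-in for itertools.batched() on Python < 3.12."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # Keep slicing n items off the front until a slice comes back empty.
    while batch := tuple(islice(iterator, n)):
        yield batch

batches = list(batched_fallback(range(10), 3))
print(batches)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```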

4. Accumulating and Folding

4.1 accumulate()

Produces running results; the default operation is addition.

python

from itertools import accumulate

data = [1, 2, 3, 4, 5]

# Running sum
list(accumulate(data))  # [1, 3, 6, 10, 15]

# Running product
from operator import mul
list(accumulate(data, mul))  # [1, 2, 6, 24, 120]

# Running maximum
list(accumulate(data, max))  # [1, 2, 3, 4, 5]

# Set an initial value (keyword-only; the result has one extra element)
list(accumulate(data, initial=100))  # [100, 101, 103, 106, 110, 115]
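
Because the sample data above is already ascending, the running maximum looks identical to the input. The sketch below uses made-up, unordered data to show the effect more clearly, and adds a custom two-argument function; all names are illustrative.

```python
from itertools import accumulate

readings = [3, 1, 4, 1, 5, 9, 2, 6]

# The running maximum only changes when a new high appears.
running_max = list(accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]

# Any two-argument function works: here, a balance that never drops below zero.
changes = [50, -80, 30, -10]
balance = list(
    accumulate(changes, lambda total, delta: max(0, total + delta), initial=0)
)
print(balance)  # [0, 50, 0, 30, 20]
```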

5. Infinite Iterators

5.1 count()

An infinite counter.

python

from itertools import count

# Start at 0, step 1
for i in count():
    if i > 5:
        break
    print(i)  # 0, 1, 2, 3, 4, 5

# Start at 10, step 2
for i in count(10, 2):
    if i > 20:
        break
    print(i)  # 10, 12, 14, 16, 18, 20

5.2 cycle()

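Cycles through an iterable endlessly. The items are saved on the first pass, so the input itself must be finite.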

python

from itertools import cycle

colors = ['red', 'green', 'blue']

# Loop forever; break out manually after six items
counter = 0
for color in cycle(colors):
    if counter > 5:
        break
    print(color)
    counter += 1
# red, green, blue, red, green, blue
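
In practice, cycle() is often paired with a finite iterable so that no manual counter is needed. A minimal round-robin sketch, with task and worker names invented for illustration:

```python
from itertools import cycle

tasks = ['t1', 't2', 't3', 't4', 't5']
workers = ['alice', 'bob']

# zip() stops when the finite input (tasks) runs out, so no break is needed.
assignments = list(zip(tasks, cycle(workers)))
print(assignments)
# [('t1', 'alice'), ('t2', 'bob'), ('t3', 'alice'), ('t4', 'bob'), ('t5', 'alice')]
```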

5.3 repeat()

Repeats an object a given number of times, or indefinitely if no count is given.

python

from itertools import repeat

# Repeat 5 times
list(repeat('hello', 5))
# ['hello', 'hello', 'hello', 'hello', 'hello']

# Often used to supply a constant argument to map()
list(map(pow, range(5), repeat(2)))
# [0, 1, 4, 9, 16]

6. Filtering and Selecting

6.1 filterfalse()

The inverse of filter(): keeps the elements for which the function returns a false value.

python

from itertools import filterfalse

data = [1, 2, 3, 4, 5, 6, 7, 8]

# Keep the odd numbers (filter out the even ones)
list(filterfalse(lambda x: x % 2 == 0, data))
# [1, 3, 5, 7]

6.2 dropwhile() / takewhile()

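Drop or keep the leading run of elements that satisfy a condition. The predicate is consulted only until it first fails; later elements are not tested.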

python

from itertools import dropwhile, takewhile

data = [1, 3, 5, 2, 4, 6]

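# Drop leading elements while the condition holds, then return everything that remains
list(dropwhile(lambda x: x < 4, data))
# [5, 2, 4, 6]

# Keep leading elements while the condition holds; stop at the first one that fails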
list(takewhile(lambda x: x < 4, data))
# [1, 3]
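
A typical use is splitting a header from the body of a file. The sketch below works on a made-up list of lines; note that the comment-like line in the middle is left alone, because the predicate is no longer checked once it has failed.

```python
from itertools import dropwhile, takewhile

lines = ['# generated file', '# do not edit', 'alpha', '# not a header', 'beta']

def is_comment(line):
    return line.startswith('#')

# takewhile() keeps the leading comment block; dropwhile() skips it.
header = list(takewhile(is_comment, lines))
body = list(dropwhile(is_comment, lines))

print(header)  # ['# generated file', '# do not edit']
print(body)    # ['alpha', '# not a header', 'beta']
```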

6.3 compress()

Filters data using a parallel iterable of selectors: an element is kept when its selector is truthy.

python

from itertools import compress

data = ['A', 'B', 'C', 'D', 'E']
selectors = [1, 0, 1, 0, 1]

list(compress(data, selectors))
# ['A', 'C', 'E']
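
The selectors do not have to be a literal list of 0s and 1s; they can be computed from a second, parallel sequence. A small sketch with invented names and scores:

```python
from itertools import compress

names = ['ann', 'ben', 'cat', 'dan']
scores = [72, 48, 91, 60]

# Build the selectors lazily from the parallel scores list.
passed = list(compress(names, (score >= 60 for score in scores)))
print(passed)  # ['ann', 'cat', 'dan']
```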

7. Other Useful Tools

7.1 pairwise()

Pairs each element with its successor (Python 3.10+).

python

from itertools import pairwise

list(pairwise([1, 2, 3, 4, 5]))
# [(1, 2), (2, 3), (3, 4), (4, 5)]

7.2 starmap()

Like map(), but unpacks each element into the function's arguments.

python

from itertools import starmap

pairs = [(2, 3), (3, 4), (4, 5)]

# map() needs a lambda to unpack each pair
list(map(lambda p: p[0] ** p[1], pairs))  # [8, 81, 1024]

# starmap() unpacks each tuple directly
list(starmap(pow, pairs))  # [8, 81, 1024]

7.3 tee()

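Splits one iterator into several independent iterators. After calling tee(), the original iterator should no longer be used directly, and values that some copies have not yet consumed are buffered in memory.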

python

from itertools import tee

data = iter([1, 2, 3, 4, 5])

# Split into 3 independent iterators
it1, it2, it3 = tee(data, 3)

list(it1)  # [1, 2, 3, 4, 5]
list(it2)  # [1, 2, 3, 4, 5]
list(it3)  # [1, 2, 3, 4, 5]
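
One well-known use of tee() is looking at neighbouring elements of a one-shot iterator. The sketch below, with the hypothetical helper name `neighbours`, approximates pairwise() for interpreters older than 3.10.

```python
from itertools import tee

def neighbours(iterable):
    """Sketch of pairwise() for Python < 3.10, built on tee()."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one position
    return zip(first, second)

prices = iter([10, 12, 11, 15])
changes = [b - a for a, b in neighbours(prices)]
print(changes)  # [2, -1, 4]
```

Here the two copies stay only one element apart, so the internal buffer remains small.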

8. Practical Scenarios

8.1 Processing Data in Batches

python

from itertools import batched

def process_batch(batch):
    print(f"Processing: {batch}")

data = list(range(10))

# Process the data in batches of 3
for batch in batched(data, 3):
    process_batch(batch)
# Processing: (0, 1, 2)
# Processing: (3, 4, 5)
# Processing: (6, 7, 8)
# Processing: (9,)

8.2 Flattening a Nested List

python

from itertools import chain

nested = [[1, 2], [3, 4], [5, 6]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]

8.3 Generating Test Data

python

from itertools import product

# Generate every combination of test parameters
test_cases = list(product(
    ['user', 'admin'],
    ['read', 'write', 'delete'],
    [True, False]
))
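
To show how such a list might be consumed, here is a sketch that runs each combination through a rule. The `is_allowed` function and its policy are hypothetical, made up purely to have something to test.

```python
from itertools import product

roles = ['user', 'admin']
actions = ['read', 'write', 'delete']
flags = [True, False]

test_cases = list(product(roles, actions, flags))
print(len(test_cases))  # 12 = 2 * 3 * 2

def is_allowed(role, action, logged_in):
    """Hypothetical rule under test: only logged-in admins may delete."""
    if not logged_in:
        return False
    return action != 'delete' or role == 'admin'

# Each case is a (role, action, logged_in) tuple, so it unpacks cleanly.
allowed = [case for case in test_cases if is_allowed(*case)]
print(len(allowed))  # 5
```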

9. Summary

The core itertools functions:

Category	Functions
Chaining	`chain()`, `chain.from_iterable()`, `zip_longest()`
Combinations and permutations	`combinations()`, `combinations_with_replacement()`, `permutations()`, `product()`
Grouping and slicing	`groupby()`, `islice()`, `batched()`
Accumulating	`accumulate()`
Infinite iterators	`count()`, `cycle()`, `repeat()`
Filtering	`filterfalse()`, `dropwhile()`, `takewhile()`, `compress()`
Other	`pairwise()`, `starmap()`, `tee()`

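itertools is a valuable tool for writing efficient Python code. With large data sets in particular, working with iterators instead of fully built lists can save a great deal of memory.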
